Constrained Policy Optimization
Abstract
For many applications of reinforcement learning it can be more convenient to specify both a reward function and constraints, rather than trying to design behavior through the reward function. For example, systems that physically interact with or around humans should satisfy safety constraints. Recent advances in policy search algorithms (Mnih et al., 2016; Schulman et al., 2015; Lillicrap et al., 2016; Levine et al., 2016) have enabled new capabilities in high-dimensional control, but do not consider the constrained setting. We propose Constrained Policy Optimization (CPO), the first general-purpose policy search algorithm for constrained reinforcement learning with guarantees for near-constraint satisfaction at each iteration. Our method allows us to train neural network policies for high-dimensional control while making guarantees about policy behavior throughout training. Our guarantees are based on a new theoretical result, which is of independent interest: we prove a bound relating the expected returns of two policies to an average divergence between them. We demonstrate the effectiveness of our approach on simulated robot locomotion tasks where the agent must satisfy constraints motivated by safety.
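To make the idea of specifying a reward function and a constraint separately more concrete, the following is a minimal sketch, not the CPO algorithm itself. It assumes a single recorded trajectory with per-step rewards and per-step safety costs, a discount factor `gamma`, and a cost limit `limit`; the helper names `discounted_return` and `satisfies_constraint` are hypothetical and introduced only for illustration.

```python
from typing import Sequence


def discounted_return(values: Sequence[float], gamma: float) -> float:
    """Return sum over t of gamma**t * values[t] for one trajectory."""
    total = 0.0
    # Accumulate from the last step backwards: G_t = v_t + gamma * G_{t+1}.
    for v in reversed(values):
        total = v + gamma * total
    return total


def satisfies_constraint(costs: Sequence[float], gamma: float, limit: float) -> bool:
    """Check whether the discounted cost of a trajectory stays within the limit.

    This is an illustrative per-trajectory check (an assumption of this sketch);
    a constrained RL method would bound the expected cost over many trajectories.
    """
    return discounted_return(costs, gamma) <= limit


# Hypothetical data: the reward and the safety cost are tracked as separate signals.
rewards = [1.0, 1.0, 1.0]
costs = [0.0, 1.0, 0.0]
gamma = 0.5

reward_return = discounted_return(rewards, gamma)
cost_return = discounted_return(costs, gamma)
```

Keeping the two signals apart means the safety requirement can be stated as a limit on `cost_return` rather than folded into the reward through a hand-tuned penalty.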
Similar resources
Stock Portfolio-Optimization Model by Mean-Semi-Variance Approach Using of Firefly Algorithm and Imperialist Competitive Algorithm
Selecting approaches with appropriate accuracy and suitable speed for decision-making is one of the challenges managers face. Investment decisions are also among managers' main decisions, and they include securities transactions in financial markets, which are one investment approach. When some assets and barriers of real world have been considered, optimization of...
A chance-constrained multi-objective model for final assembly scheduling in ATO systems with uncertain sub-assembly availability
A chance-constrained multi-objective model under uncertainty in the availability of subassemblies is proposed for scheduling in ATO systems. The on-time delivery of customer orders as well as reducing the company's cost is crucial; therefore, a three-objective model is proposed including the minimization of 1) overtime, idle time, change-over, and setup costs, 2) total dispersion of items’ deliver...
On the hybrid conjugate gradient method for solving fuzzy optimization problem
In this paper we consider a constrained optimization problem where the objectives are fuzzy functions (fuzzy-valued functions). The fuzzy constrained optimization (FO) problem plays an important role in many fields, including mathematics, engineering, statistics and so on. On the other hand, in the real situations, it is important to know how may obtain its numerical solution of a given interesting...
Resource Allocation and Multiagent Policy Formulation for Resource-Limited Agents Under Uncertainty
The problem of optimal policy formulation for teams of resource-limited agents in stochastic environments is composed of two strongly coupled subproblems: a resource allocation problem and a policy optimization problem, both of which have individually received a significant amount of attention. We show how to combine the two problems into a single constrained optimization problem that yields optim...
Quasi-Newton Methods for Nonconvex Constrained Multiobjective Optimization
Here, a quasi-Newton algorithm for constrained multiobjective optimization is proposed. Under suitable assumptions, global convergence of the algorithm is established.
Constrained Markov Decision Models with Weighted Discounted Rewards
This paper deals with constrained optimization of Markov Decision Processes. Both the objective function and the constraints are sums of standard discounted rewards, but each with a different discount factor. Such models arise, e.g., in production and in applications involving multiple time scales. We prove that if a feasible policy exists, then there exists an optimal policy which is (i) stationary (non...
Publication date: 2017